plots_top<-tail(skills_count,10)

darkcols <- brewer.pal(8,"Dark2")
names <- plots_top$Skills
barplot(plots_top$Total,main="Indeed Counts", horiz=TRUE, names.arg=names, las=1, col=darkcols, cex.axis=0.5, cex.names = 0.5)

top10_skills<-skills_city[1:10,]
ggplot(top10_skills, aes(x=Skills, y=Total, colour= City, size = Total)) + geom_point()

library(wordcloud)
wordcloud(skills_count$Skills,skills_count$Total, random.order=FALSE, colors=brewer.pal(8,"Dark2"))
## Warning in wordcloud(skills_count$Skills, skills_count$Total, random.order
## = FALSE, : Machine Learning could not be fit on page. It will not be
## plotted.

Drilling down on the Data Scientist jobs in NY, let's look at a horizontal bar chart of all skills, with the skill type indicated by each bar's color.

# Order the factor levels by ascending count so the bars appear sorted in the chart
ny_indeed$key_words <- factor(ny_indeed$key_words, levels = unique(ny_indeed$key_words)[order(ny_indeed$count, decreasing = FALSE)])
m <- list(
  l = 100,
  r = 100,
  b = 100,
  t = 100,
  pad = 4
)
key_word_plot <- plot_ly(data = ny_indeed, x = ~count, y = ~key_words, type = 'bar', orientation = 'h', color = ~type) %>% 
  layout(title = 'Skills Required of Data Scientists in NY', margin = m)  # apply the margins defined in m

key_word_plot
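
The `factor()` call above fixes the level order of `key_words`, which the chart is assumed to follow when it lays out the bars. Here is a minimal base R sketch of the same pattern. The data frame `toy` and its values are made up for illustration and are not the Indeed data. It also shows `reorder()` as an alternative that does not depend on every key word appearing in exactly one row.

```r
# Toy data for illustration only; NOT the Indeed scrape
toy <- data.frame(
  key_words = c("sql", "python", "excel"),
  count     = c(30, 40, 10),
  stringsAsFactors = FALSE
)

# Same pattern as above: levels sorted by ascending count
by_levels <- factor(toy$key_words,
                    levels = unique(toy$key_words)[order(toy$count, decreasing = FALSE)])
levels(by_levels)

# Alternative: reorder() sorts levels by a summary of count (the mean by default)
by_reorder <- reorder(toy$key_words, toy$count)
levels(by_reorder)
```

Both approaches give the same level order here. `reorder()` may be the safer choice if a key word can appear in more than one row, since indexing `unique()` with `order()` assumes the two vectors have the same length.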

Now let's look at which type of skill was mentioned most often in job descriptions by plotting the aggregated data.

grpd$type <- factor(grpd$type, levels = unique(grpd$type)[order(grpd$sum_by_type, decreasing = FALSE)])
sum_by_type <- plot_ly(data = grpd, x=~sum_by_type, y=~type, type = 'bar', orientation = 'h', color = ~type) %>% 
  layout(title='NY Skills by Type')

sum_by_type  

Conclusion

Our findings show that many skills are required of a Data Scientist. Some of the top hard skills required are Python, Machine Learning, Big Data, SQL, Excel, and R. As for soft skills, a Data Scientist is expected to have communication skills and managerial experience. Mathematics (or math) was the most frequently mentioned key word across all NY Data Scientist job postings. However, from our NY data we cannot definitively conclude whether one type of skill is significantly more important than another. Although we counted more mentions of hard skills than of soft skills or education requirements, our search included more key words for hard skills than for either of the other types, so this result is to be expected. Most importantly, we learned that a Data Scientist is required to be well rounded, with a strong higher education and both soft and hard skills, to ensure they can get the job done.
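
One possible way to make the comparison across skill types fairer, sketched here and not something we ran on our data, is to divide each type's total mentions by the number of key words searched for that type. The data frame `toy` and its counts are made up purely for illustration. Only the column names `key_words`, `count`, and `type` mirror `ny_indeed`.

```r
# Toy data for illustration only; these counts are NOT from the Indeed scrape
toy <- data.frame(
  key_words = c("python", "sql", "r", "excel", "communication", "masters"),
  count     = c(40, 30, 20, 10, 30, 25),
  type      = c("hard", "hard", "hard", "hard", "soft", "education"),
  stringsAsFactors = FALSE
)

# Total mentions and number of key words searched, per skill type
total_by_type <- tapply(toy$count, toy$type, sum)
words_by_type <- tapply(toy$key_words, toy$type, length)

# Average mentions per key word, which removes the advantage of
# simply having searched for more key words of one type
mentions_per_word <- total_by_type / words_by_type
mentions_per_word
```

In this toy example the hard skills have the largest raw total, but once the total is divided by the number of key words searched they no longer lead. That is the effect we would need to rule out before ranking the types.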